2) Were there specific days of the week that saw higher intake volumes?

When looking at all species together, Tuesday and Wednesday have slightly higher volumes than other days.

# add the weekday variable
foo$weekday <- weekdays(foo$intake_date)

# Change ordering manually
foo$weekday_ord <- factor(foo$weekday,                                    
                          levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))

# function for plotting
day_volume <- function(foo){
  
  # data frame to plot
  plot_data <- foo %>% count(weekday_ord) 
  
  # generate plot
  ggplot(plot_data) + geom_col(aes(x = weekday_ord, y=n)) +
    labs(x = 'Day', y = 'Count')+
    geom_text(aes(x = weekday_ord, label = sprintf("%.f", n), y= n), vjust=2, colour="white", size=4)
}

# plot for all species
day_volume(foo)

When looking at dogs only, they are still higher, but the trend is not apparent. Sunday is relatively lower.

# plot for dogs
day_volume(foo %>% filter(species == 'Dog'))

Which suggest that for cats we will see it more distinctly, as indeed this figure shows. More specifically, Wednesday is cat day (Thursday following), as Tuesday is calmer than all other days.

# plot for cats
day_volume(foo %>% filter(species == 'Cat'))

3) Where are there holes in the data?

hint: think about providing an analysis that a shelter operations director might be able to use to try and tell how staff are doing with proper data input.

The missing values, in decreasing order:

  1. Intake reason has 3511 missing values. 2370 of those are for strays (and the reason listed for 2276 of the 2335 strays with a reason listed is ‘STRAY’), but 1141 non-stray animals, whose reason entries are usually informative, are still missing.
  2. Intake subtype follows with 698 NAs. 398 of those are strays, but other strays do have an informative value here (mostly Field/OTC). Also, it seems like there are many common values between reason and intake_subtype (for example, eviction, owner died, Covid) which suggest some standardization could be useful there (decide what counts as a subtype and what counts as a reason, and make these categories mutually exclusive). Separating the categories and improving their data input is the main finding here.
  3. Found ZIP code has 280 NAs, whereas Finder has 160 values of ‘0’, which is better changed to a blank or ‘Unknown’ value for consistency.
  4. Date of birth has 34 NAs.

4) Surprise us! Using the data, please provide a visualization that gives a unique insight into the data.

Here’s a heat-map showing the number of Found animals per ZIP code! A few ZIP codes in the center of the city stand out and numbers relatively decrease in the outskirts. 85705 and 85706 also stand out as fairly smaller ZIP codes with high intakes (although presumably with a denser population). 114 animals also came from way outside town (85321 - a different shelter?).

# load ZIP codes geometry
geometry <- readRDS('zips.rds')

# count finders per ZIP code (did not plot that for simplicity)
finder_count <- foo %>% filter(src_finders_zip_code!=0) %>% group_by(zip=src_finders_zip_code) %>% 
  summarise(count = length(src_finders_zip_code), .groups='keep')

# count found per ZIP code
found_count <- foo %>% group_by(zip=src_found_zip_code) %>% 
  summarise(count = length(src_found_zip_code), .groups='keep')

# merge the counts and name properly
countDF <- inner_join(finder_count,found_count, by='zip')
colnames(countDF) = c('zip', 'countFinder', 'countFound')
countDF$zip = as.character(countDF$zip)

# merge with the ZIP code geometry
count_sf <- geometry %>% inner_join(countDF, by = "zip")


fix_sf <- function(old_sf){
  # transform the shapefile to the correct projection to make leaflet happy
  return(st_transform(old_sf, '+proj=longlat +datum=WGS84')) 
}  

# create color palette
pal <- colorBin(palette='Purples', domain = count_sf$countFound, bins = c(0, 50, 100, 200, 500, 1000))

# create tooltip label
label <- sprintf("<strong>%s</strong><br/>%g %s", count_sf$zip, count_sf$countFound, 'Found Animals') %>% 
  lapply(htmltools::HTML)

# create map
leaflet() %>%
  addTiles() %>%
  setView(lat = 32.2239217, lng = -110.917225, zoom=8) %>% 
  addPolygons(data=fix_sf(count_sf), group='Found', fillColor=~pal(countFound),
              fillOpacity = 0.7, color='grey', weight = 1, opacity = 0.4, label = label,
              highlightOptions = highlightOptions(color = "black",weight = 2, bringToFront = TRUE)) %>%
  addLegend(pal = pal, values = count_sf$countFound, opacity = 0.7, title = 'Found Animals',
                            position = "bottomright", group='Found')